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ABSTRACT 

Information retrieval (IR) system is about positioning 
reports utilizing client's question and get the important 
records from extensive dataset. Archive positioning is 
fundamentally looking the pertinent record as per 
their rank. Document ranking is basically search the 
relevant document according to their rank. Vector 
space model is traditional and widely applied 
information retrieval models to rank the web page 
based on similarity values. Term weighting schemes 
are the significant of an information retrieval system 
and it is query used in document ranking. Tf-idf 
ranked calculates the term weight according to users 
query on basis of term which is including in 
documents. When user enter query it will find the 
documents in which the query terms are included and 
it will count the term calculate the Tf-idf according to 
the highest weight of value it will gives the ranked 
documents. 
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I. INTRODUCTION 

In the information retrieval (IR) system documents 
are ranked optimally by using user’s query to find out 
the relevant documents from large data base or form 
dataset [21].When the user gives a query, the index is 
consulted to archives the most relevant documents. 
The relevant documents are then ranked significance 
of their degree of relevance. Majority of internet users 
rely on search engines for extracting information by 
providing a query from any large dataset. These 
queries are processed by the search engines and a 
certain information retrieval or mining algorithm is 
applied to obtain the cluster of documents related to 
the query. After the retrieval of these documents, an 
important task is to present these documents in a list 
where documents at the top are the ones considered 
more relevant for the user. This task is called ranking 
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of documents [15].Information retrieval system is a 
set of documents to discover convenient information 
equivalent to a user’s query. In information retrieval 
basically data can be fetching from web structure 
information that can be type of content, pictures, 
graph etc. Several components make this task 
challenging :(i) normally unstructured information is 
in document database; (ii) reports are typically 
composed in unconstrained characteristic dialect; iii) 
regularly, the documents cover extensive variety 
range of subjects. There are various information 
model used .one of them is vector space model. It is a 
model for representing text documents or any other 
items as vectors of identifiers [17]. It is utilized as a 
part of information filtering, information retrieval, 
indexing and relevancy rankings, relevance rankings 
of documents in a keyword search can be calculated, 
using the suppositions of document equivalence 
theory, by comparing the deviation of angles between 
each document vector and the main query vector 
where the query is represented as the equivalent of 
vector as the documents. The vector space model 
technique can be partitioned into three stages. The 
main stage is the document indexing where content 
relevance terms are extracted from the document text. 
The second stage is the weighting of the indexed 
terms to enhance retrieval of documents relevant to 
the user. In the last stage, rank of the documents 
archives as per the query comparability value [4] [7], 
Documents and queries are shows as vector 

dj = (wij,w 2J ' . w tJ ) 

Q = ( w l,q, w 2,q, . w n,q,) (1) 

There are dissimilar techniques for computing these 
values, which are known as (term) weights, have been 
produced. One of the best known schemes is Tf-idf 
weighting. A main advantage is that it is simple model 
based on linear algebra, ranking documents according 
to their possible relevance. Their term-weighting 
schemes enhance retrieval performance. Its partial 
matching procedure permits retrieval of documents 
that approximate the query conditions. 
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TF-IDF VECTOR SPACE MODEL 

In Information retrieval Tf-Idf is kn own as term 
frequency and inverse document frequency. It is a 
co mm on method to assess how a word is required a 
document. It is commonly used as weighting factor in 
information retrieval. Tf-Idf is also a very interesting 
method to convert the textual representation of 
information into a Vector Space Model (VSM). The 
weight of term in document vector can be determined 
using method [14]. The weight of term is measured 
sometimes term j obtain in the document i (the term 
frequency) and tdf (the inverse document frequency) 
[7]. The weight of a term j in the document i is given 
by 

w i,j =freqij xicif 

And idf i = log — 

° n i 


II. MOTIVATION 

Research in document ranking is motivated not only 
by the challenges that are system faces but there are 
various reasons are listed below which motivated to 
do research in this area. The nature of most data on 
the Web is so unstructured that they can only be 
understood by humans, but the amount of data is so 
huge that they can only be processed efficiently by 
machines. From that large amount of data its difficult 
to find the relevance document according to user 
requirement. Different model used for information 
retrieval among of that vector space model is used for 
the information retrieval, indexing and ranking. Tf- 
Idf, term frequency-inverse document frequency 
ranker is a popular mechanism to calculate the 
relevance of very large documents. In traditional Tf- 
Idf ranker would calculate the weight of each 
document with respect to the words from the given 
query. This technique used when the words from the 
document is shared . But, due to the richness of 
natural language, a query can be expressed in 
different ways by different users. This mechanism is 
in which it retrieves the concepts which are most 
similar to the query from user. 
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III. PROPOSED SYSTEM 

The proposed system is about ranking of different 
type of documents. Ranking is the process of ordering 
a set of items in order to show the most relevant first. 
In Ranking is the core of an Information Retrieval 
system because we need to know in what order to 
present the returned documents to the user. We are 
going to develop document ranking method for 
research data. Here use Vector space model for 
ranking the document, generally ranking method 
doesn’t include phrase word at time of for selection. 
In our proposed method we will decide phrase word 
and also synonyms as term and we calculate TF-IDF 
to generate vector matrix. Tf-idf ranked calculates the 
term weight according to users query on basis of term 
which are include in documents. When user enter 
query it will find the documents in which the query 
terms are included and it will count the term calculate 
the TF-IDF according to the highest weight of value it 
will gives the document ranked . 

ALGORITHM STEPS: 

Input : User Query 
Output : Ranked Documents 

Parameters : 

QTL : Query term list 

Re : Reading Counter 

TF : Term Frequency 

IDF : Inverse Document Frequency 

Step 1 : Read the Query/Read the Document 

Decide format of Documents (Pdf, Docx, Txt 
etc.) 

Step 2 : Read the documents Text and Remove Stop 
Words (is, are, that etc.) 

(SMARTWORD List API) 

Step 3 : Inialize QueryTermList (QTL/DTL 

(Document term list), Phrase Wordlist = null, 
matrix[N][QTsize] to 0, 

Reading Counter to 0 
Step 4 : Spilt and add each term to QTL 

Find the Phrase Word in Query(E.g. HOT 
DOG) 

Add Phrase words to Phrase Wordlist 
Step 5 : Remove the Single Words used in Phrase 
word from QTL 

Append Phrase Word List to QTL 

Find the synonyms of each term using word 

net API (Eg.BIG=HUGE) 
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Step 6 


Step 7 
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slf synonyms not in list then add to list Step 8 : Calculate TF and IDF [m][n] 

Get the Size of Document List N (Metrics 

Rows) (N no of rows and Tf is Step 9 : Depend on the maximum value Rank the 
Colum)[m][n] Document 

While Re <N 

& set Document Term List to Step 10 : Update Index 


null 

For each Term in Document Term List 
Update the Matrices based on TF 
End for 

end while 



NO 


IV. EVALUATION MEASURES 

Document ranking can be valued on various measures 
few of them are discussed in this section. 


1) Precision - Precision measures how many of the 
items that the system identified were actually 
correct, regardless of whether it also failed to 
retrieve correct items. [24] 

Precision = tp / (tp + fp) 
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where, 

tp = True Positive, case was positive and 
predicted positive 

fp = False Positive, case was negative but 
predicted positive 

2) Recall - Recall measures how many of the items 
that should have been identified actually were 
identified, regardless of how many spurious 
identifications were made. [24] 

Recall = tp / (tp + fn) 

where, 

tp = True Positive, case was positive and 
predicted positive 

fn = False Negative, case was positive but 
predicted negative 

3) F-Measure - The F-measure is often used in 
conjunction with Precision and Recall, as a 
weighted average of the two. 

F-Measure = 2 * (P * R) / (P + R) 

where, 

P= Precision 
R=Recall 

4) Accuracy - Accuracy is the proportion of true 
results (both true positives and true negatives) 
among the total number of cases examined. 

Accuracy = (tp + tn) / (tp + tn + fn + fp) 

where, 

tp = True Positive, case was positive and 
predicted positive 

tn = True Negative, case was negative and 
predicted negative 

fn = False Negative, case was positive but 
predicted negative 

fp = False Positive, case was negative but 
predicted positive 


EXPERIMENTAL RESULTS 

To measure the performance of the proposed system 
the parameters available are discussed in the above 
section. It has been compared with the Time 
Complexity of ranked document. That is shown in 
Figure 5.1 
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Fig. Evaluation Measures Result 

Above figure shows the evaluation measures 
Precision, Recall ,F-measure and accuracy. Accuracy 
minimum time is 0.6 and in that time query will be 
exacting and gives the ranked documents. 

V. LITERATURE SURVEY 

In 2013 the researcher Jiaul H. Paik is represents a 
novel TF-IDF term weighting scheme. The suggest 
term weighting scheme has two feature of within 
document term frequency assign to discover the 
importance of a term. Experiments done at the huge 
amount of TREC news and web collection data and 
proposed that the out performs five state of the art 
retrieval model is significance and consistent. And its 
shows that proposed model better than the existing 
models [6]. 

T.Suganya and M.Ravichandran proposed a method in 
e-leaming rank ordering the documents according to 
their individual term relevance degree using 
possibility approach and vector-based technique 
method. This proposed system provides highly 
relevant learning materials to the learner and it 
recommends the items based on individual term 
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relevance with respect to the query specified by the 
user [5], 

In proposed vector space model used in XML 
document ranking suggest by Weimin He and Teng 
Lv. They proposed effectively rank Xml document 
and also differentiate the framework with Lucene to 
demonstrate their extended TF*IDF is successful and 
it is effective ranking than existing XML search 
engine Lucene [15]. 

Premalatha.R and Srinivasan.S emphasis, on 
Information retrieval for Tamil literary document 
using the model vector space, they approaches in text 
processing in information retrieval. In their system 
explore that can be divided into three categories, Main 
topic search, Subtitle search and Keyword search. So 
the system would explore necessitate information 
rapidly mainly using the vector space model, that 
illustrate documents as vectors. It would be applicable 
for all Tamil literates and understudies to look and 
learn [2] 

In their research Vaibhav Kant Singh and Vinay 
Kumar Singh is proposed a vector space model using 
in information retrieval .In that individual document 
and user query is represented as a vector based against 
the vocabulary and Calculating similarity measure and 
than Ranking the documents for relevance and other 
variant of VSM that Term weighting, Normalized 
term frequency(tf) and Inverse document frequency 
(idf) is shows in their system[l]. 

Bo Yu and Guoray Cai is recommend a dynamic 
document ranking scheme join thematic and 
geographic pertinence measures on a for each query 
premise. They have been using Dempster-Shafer’s 
theory to gather the two different sources of ranking 
verification and evaluate the different web document 
data set. Which can be either news stories or blog and 
it can be fetch from the web data [19]. 

Dik Lun Lee, Huei Chuang and Kent E.Seamons is 
proposed that Using various interpretation of the 
vector-space model for text retrieval queries, they 
optimal balance between processing efficiency and 
retrieval effectiveness as expressed in relevant 
document rankings. They using six different vector 
method and their retrieval effectiveness [20]. 
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VI. CONCLUSION 

In this paper we conclude that on the basis of query 
document ranking is utilized for search relevant 
document so that information retrieval is a process of 
searching and retrieving the knowledge based 
information from collection of documents, for that 
their distinctive model is used with its advantages, 
documents are comparing with the input query. 
Vector space model is used in information filtering, 
information retrieval and relevancy ranking of 
documents, ranked first. Tf-idf ranked calculate the 
term weight according to users query on basis of term 
which are include in documents. When user enter 
query it will find the documents in which the query 
terms are included and it will count the term calculate 
the TF-IDF according to the highest weight of value it 
will gives the document ranked 
Future research include variants of Tf-idf and 
weighting of query word can be find and in future 
using special character also used in query searching.. 
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